Multi-Modal Scene Interpretation
Authors
Abstract
The visionary goal of developing an easy-to-use service robot implies several key tasks, such as speech understanding, object recognition, and scene understanding. Besides these more sensor-oriented capabilities, such systems need extensive meta-knowledge, e.g., about mental representations of spatial relations, to align the views of man and machine. Only if all parts fit together can unrestricted man-machine communication be established. A cognitive system therefore has to address many different parts that must be integrated, both in the technical sense and, especially, in the cognition models [1]. In particular, when a perceptive component is connected with a spatial reasoning component through a speech recognition and synthesis component, the probabilistic domain of object recognition has to be coupled with the logical domain of formal reasoning. The cognitive vision system ORCC presented here combines diverse recognition strategies that afford an extensive description of an unreserved scene. In a first step, the room demarcations and structurally simple objects such as tables are extracted using both functional and structural properties. Further objects are then segmented based on their position, followed by a structurally more complex and more shape-oriented recognition step. This spatial information is then enriched with colour-based information about the objects. Finally, the resulting scene description can be used as input for a speech-based man-machine dialogue about the objects within the scene [2].
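The coupling of probabilistic recognition with logical reasoning described in the abstract can be illustrated with a minimal sketch. It is not the ORCC implementation: the `Detection` fields, the confidence threshold, the `margin` value, and the `left_of` relation are all assumptions chosen for illustration. The idea is that uncertain recogniser hypotheses are thresholded into symbolic facts, enriched with colour, and then queried the way a dialogue component might.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Detection:
    """One recognised object hypothesis (all fields are hypothetical)."""
    name: str          # unique identifier of the object in the scene
    label: str         # most likely object class
    confidence: float  # recogniser score in [0, 1]
    x: float           # centre position along the viewer's left-to-right axis
    colour: str        # dominant colour name from a colour-classification step


def to_facts(detections, min_confidence=0.6, margin=0.05):
    """Turn probabilistic detections into symbolic facts for a reasoner.

    Hypotheses below min_confidence are dropped, so the logical layer only
    sees objects the recogniser is reasonably sure about.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    facts = set()
    for d in kept:
        facts.add(("type", d.name, d.label))
        facts.add(("colour", d.name, d.colour))
    # Derive a spatial relation from positions; margin avoids calling two
    # almost-aligned objects "left of" each other.
    for a, b in permutations(kept, 2):
        if a.x + margin < b.x:
            facts.add(("left_of", a.name, b.name))
    return facts


def left_of(facts, reference):
    """Names of all objects known to be left of the reference object."""
    return sorted(a for (rel, a, b) in facts if rel == "left_of" and b == reference)


if __name__ == "__main__":
    scene = [
        Detection("cup1", "cup", 0.9, 0.2, "red"),
        Detection("book1", "book", 0.8, 0.6, "blue"),
        Detection("ghost", "plate", 0.3, 0.4, "white"),  # too uncertain, dropped
    ]
    facts = to_facts(scene)
    print(left_of(facts, "book1"))  # ['cup1']
```

A hard threshold is the simplest possible bridge between the two domains; a real system would more plausibly keep the confidences and let the dialogue component ask clarifying questions about uncertain objects.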
Similar resources
Multi-modal Data Fusion Techniques and Applications
In recent years, camera networks have been widely employed in several application domains such as surveillance, ambient intelligence or video conferencing. The integration of heterogeneous sensors can provide complementary and redundant information that, when fused with visual cues, allows the system to obtain an enriched and more robust scene interpretation. A discussion about possible architectures an...
Grouping Over Stereo for Visual Cues Disambiguation
In stereo-vision, the goal is to reconstruct the three-dimensional structure of the scene observed from two camera inputs. The core problems are the matching of features into both camera frames, and the interpretation of image features in terms of the 3D scene. In this paper, we use a rating scheme of the potential correspondences, based on the multi-modal intrinsic similarity of the features. ...
Cross-calibration of time-of-flight and colour cameras
Time-of-flight cameras provide depth information, which is complementary to the photometric appearance of the scene in ordinary images. It is desirable to merge the depth and colour information, in order to obtain a coherent scene representation. However, the individual cameras will have different viewpoints, resolutions and fields of view, which means that they must be mutually calibrated. Thi...
Multi-modal Auto-Encoders as Joint Estimators for Robotics Scene Understanding
We explore the capabilities of Auto-Encoders to fuse the information available from cameras and depth sensors, and to reconstruct missing data, for scene understanding tasks. In particular we consider three input modalities: RGB images; depth images; and semantic label information. We seek to generate complete scene segmentations and depth maps, given images and partial and/or noisy depth and s...
Analyzing the Affect of a Group of People Using Multi-modal Framework
Millions of images on the web enable us to explore images from social events such as a family party, thus it is of interest to understand and model the affect exhibited by a group of people in images. But analysis of the affect expressed by multiple people is challenging due to varied indoor and outdoor settings, and interactions taking place between various numbers of people. A few existing wo...
Unifying Registration and Segmentation for Multi-sensor Images
We propose a method for unifying registration and segmentation of multi-modal images assuming that the hidden scene model is a Gibbs probability distribution.
Journal title:
- KI
Volume 22, Issue -
Pages -
Publication date 2008